πŸ“– Bad OTP β€” Data & Lineage

← Dashboard

How the two OTP tables behind this dashboard are built. server = Cost tabclient = User-Behavior tab

The two tables cannot be joined β€” the server table has sms_session_id but no trace_id; the client table has otp_trace_id but no sms_session_id. That's why dashboard filters are split per tab (Country/Channel drive server, Platform/Version drive client).

β‘  dwd_botim_logs_server_otp_session_di server Β· 122 days

One row = one carrier SMS send, enriched with its verification outcome. Powers the Bad OTP / Cost tab and the supplier analysis. Hive EXTERNAL Parquet, partitioned by dt (yyyyMMdd), at obs_abucc.db/…; queried via Grafana StarRocks (fevnxapb1i0hsf).

Lineage β€” LEFT JOIN two server log streams on sms_session_id

β”Œβ”€ carrier-send log ─────────────┐ β”Œβ”€ VerifyAuthCode log ───────────┐ β”‚ sms_session_id (uuid) β”‚ β”‚ sms_session_id β”‚ β”‚ carrier_ts / ptype / price β”‚ LEFT β”‚ verify_ts / verify_reason β”‚ β”‚ to_number / purpose β”‚ JOIN β”‚ device_type / country_code β”‚ β”‚ is_carrier_success β”‚ ──────► β”‚ is_verify_success β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ on β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ sms_session_id β–Ό dwd_botim_logs_server_otp_session_di (1 row / carrier send)

Reconstructed build HQL (daily)

-- source names are pseudo; real ODS/log tables live in the pipeline
INSERT OVERWRITE TABLE obs_abucc.dwd_botim_logs_server_otp_session_di PARTITION (dt)
SELECT
  s.sms_session_id, s.carrier_ts, s.ptype, s.price,
  CASE WHEN s.price IS NOT NULL THEN 1 ELSE 0 END  AS is_carrier_success,
  s.to_number, s.purpose,                              -- 1=SIGNUP 2=VERIFY 8=PAYBY_NOTIFY
  v.verify_ts, v.verify_reason, v.device_type, v.country_code, v.verify_device_key,
  CASE WHEN v.verify_reason = 'success' THEN 1 ELSE 0 END AS is_verify_success,
  v.verify_ts - s.carrier_ts AS carrier_to_verify_ms, s.dt
FROM ods_carrier_send_log s
LEFT JOIN ods_verify_authcode_log v ON s.sms_session_id = v.sms_session_id AND v.dt = s.dt
WHERE s.dt = '${bizdate}';

Column dictionary

columnmeaning
sms_session_idSMS session UUID β€” join key to VerifyAuthCode
ptypevendor: ks=Etisalat, ys=BytePlus, rs=Transland, cm=CM.com
priceUSD; present β‡’ carrier success, NULL β‡’ failure
purpose1=SIGNUP Β· 2=VERIFY Β· 8=PAYBY_NOTIFY (payment notify, NOT OTP)
verify_reasonsuccess / wrongCode / authCodeExpired / fingerprintRisk …
device_type0=Android, 1=iOS (present on success only)
is_verify_success1 = verified, 0 = failed/never
carrier_to_verify_mssend β†’ verify duration (ms)
Gotchas: purpose=8 is payment-notification SMS, not OTP β†’ for OTP analysis filter purpose IN (1,2). Rates use denominator = total sends. Valid history starts 2026-02-09 (earlier days had verify unlogged = false 100% fail).

β‘‘ zhifeng_otp_event_flatten_di client Β· 1 day (2026-06-28)

One row = one client UI-track event (OTP Sent / Entered / Verification SuccessΒ·Failed / Block / Page State). Powers the User Behavior tab β€” the "why users drop off" side the server can't see. Impala Parquet, partition dt.

Source & parsing trick

Source vew.logs_footprint_extra is a firehose: a single json string + dt. Inside, name/when/baseInfo are real nested objects, but payload is a stringified JSON with space-containing keys:

field grouptechnique
payload.* (event, Otp Trace Id, Input Method, Time Stamp)regexp_extract(pl,'"Key":"([^"]*)"',1) on the payload string
name / when / baseInfo.*get_json_object(json,'$.baseInfo.x')

Build flow

vew.logs_footprint_extra (dt) -- raw json string + dt β”‚ pl = get_json_object(json,'$.payload') β–Ό filter pl event LIKE 'OTP %' β”‚ regexp_extract payload params + get_json_object baseInfo fields β–Ό INSERT OVERWRITE PARTITION (dt='20260628') β–Ό vew.zhifeng_otp_event_flatten_di -- 1 row / client OTP event

Impala specifics: STORED AS PARQUET (can't write ORC); from_unixtime(CAST(CAST(ts/1000 AS BIGINT) …)) because from_unixtime needs BIGINT. Full script in the repo (analysis/otp_flatten.hql).

Gotchas (all bit us):

Server vs client β€” the pair

server (Cost tab)client (User-Behavior tab)
grain1 carrier SMS send1 UI-track event
built fromcarrier-send βŸ• VerifyAuthCode on sms_session_idparsed logs_footprint_extra payload JSON
join keysms_session_id (no trace_id)otp_trace_id (no sms_session_id)
history122 days (from 2026-02-09)1 day (2026-06-28)
answersdelivery Β· vendor Β· cost Β· fail typefunnel Β· blocks Β· auto-fill Β· latency Β· platform
device_type0=Android, 1=iOS0=iOS, 1=Android