
System design

Mobile Architecture Guide

A senior-level tour of mobile system design mapped onto modern Android — Compose, MVVM/MVI, Clean Architecture, and offline-first. This is the “how do you think about building an Android app at scale” material.

01 · The mobile system-design framework

Use a repeatable spine so you don't freeze: Clarify → High-level design → Deep dive → Trade-offs → Summary.

  • Clarify for a few minutes first: functional requirements, scale, offline expectations, real-time needs, min SDK, and which screens matter. Resist designing until you've scoped.
  • High-level design (HLD): sketch three layers, namely UI (Compose + ViewModel), domain (use-cases), and data (repository + remote/local sources).
  • Deep dive one component they care about (e.g. the sync engine or the feed).
  • Trade-offs: say them out loud — every choice costs something.
  • Summarize and handle follow-ups.
Senior tell: Clarify before designing, and narrate the trade-off behind every decision. Interviewers grade your process more than the final boxes-and-arrows.

02 · Layered architecture & unidirectional data flow

Three layers, dependencies pointing inward:

  • UI layer — Composables render an immutable UiState; the ViewModel exposes it as StateFlow and receives events as function calls.
  • Domain layer (optional) — pure-Kotlin use-cases holding business rules, no Android dependencies, the most testable layer.
  • Data layer — repositories own a single source of truth and coordinate remote (Retrofit) + local (Room/DataStore) sources, mapping DTOs to domain models.
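The three layers can be sketched in plain Kotlin. This is an illustration under assumed names: ArticleDto, Article, ArticleRepository, DefaultArticleRepository, and GetFreshArticles are hypothetical, and the remote source is a plain lambda standing in for a Retrofit call so the block carries no Android dependencies.

```kotlin
// Data layer: wire format as the backend sends it (hypothetical fields).
data class ArticleDto(val id: String, val title: String?, val publishedAtEpochMs: Long)

// Domain layer: the model the rest of the app works with.
data class Article(val id: String, val title: String, val publishedAtEpochMs: Long)

// Mapping lives in the data layer so DTO quirks (a nullable title) never leak upward.
fun ArticleDto.toDomain(): Article = Article(
    id = id,
    title = title?.trim().orEmpty().ifEmpty { "(untitled)" },
    publishedAtEpochMs = publishedAtEpochMs,
)

// The domain layer depends on this interface only; the data layer implements it.
interface ArticleRepository {
    fun articles(): List<Article>
}

// Data-layer implementation. `fetchRemote` stands in for a Retrofit service call.
class DefaultArticleRepository(
    private val fetchRemote: () -> List<ArticleDto>,
) : ArticleRepository {
    override fun articles(): List<Article> = fetchRemote().map { it.toDomain() }
}

// Use-case: one business rule, no Android imports, so it is easy to unit-test.
class GetFreshArticles(private val repository: ArticleRepository) {
    operator fun invoke(nowEpochMs: Long, maxAgeMs: Long): List<Article> =
        repository.articles()
            .filter { nowEpochMs - it.publishedAtEpochMs <= maxAgeMs }
            .sortedByDescending { it.publishedAtEpochMs }
}
```

Because the use-case sees only the ArticleRepository interface, a test can pass a fake repository and never touch the network or a database.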

Unidirectional data flow: state travels one way, from the data layer to the UI as Flows; events travel the other way, from the UI toward the data layer as function calls. The UI is a function of state, so it's predictable and testable.
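One way to make "the UI is a function of state" concrete is a pure reducer. The sketch below is a stand-in for a ViewModel written without coroutines or Android types; CounterUiState, CounterEvent, reduce, and CounterStore are illustrative names, and in a real app the state holder would more likely be a StateFlow.

```kotlin
// Immutable snapshot the UI renders. Field names are illustrative.
data class CounterUiState(val count: Int = 0, val error: String? = null)

// Events the UI can send; a sealed type keeps the `when` below exhaustive.
sealed interface CounterEvent {
    object Increment : CounterEvent
    object Decrement : CounterEvent
    data class SetTo(val value: Int) : CounterEvent
}

// Pure function: (old state, event) -> new state. No side effects, so it is easy to test.
fun reduce(state: CounterUiState, event: CounterEvent): CounterUiState = when (event) {
    CounterEvent.Increment -> state.copy(count = state.count + 1, error = null)
    CounterEvent.Decrement ->
        if (state.count == 0) state.copy(error = "Count cannot go below zero")
        else state.copy(count = state.count - 1, error = null)
    is CounterEvent.SetTo -> state.copy(count = event.value.coerceAtLeast(0), error = null)
}

// Minimal state holder standing in for a ViewModel: events in, state out.
class CounterStore(initial: CounterUiState = CounterUiState()) {
    var state: CounterUiState = initial
        private set
    private val listeners = mutableListOf<(CounterUiState) -> Unit>()

    fun observe(listener: (CounterUiState) -> Unit) {
        listeners += listener
        listener(state) // replay the current value, as StateFlow does for a new collector
    }

    fun onEvent(event: CounterEvent) {
        state = reduce(state, event)
        listeners.forEach { it(state) }
    }
}
```

The UI never mutates state directly: it calls onEvent and re-renders whatever snapshot comes back, which is what keeps the flow one-directional.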

03 · The networking layer

A robust client separates concerns: OkHttp (engine, pool, cache, interceptors) under Retrofit (typed API) with Moshi/kotlinx.serialization for JSON.

  • Interceptors: application-level for auth headers and logging; network-level for cache-control. An Authenticator handles 401 token refresh in one place.
  • Result modeling: wrap calls in a sealed success/error type; never let raw HttpException/IOException reach Compose.
  • Resilience: sane timeouts, retry-with-backoff on idempotent calls only, and cancellation tied to the caller's coroutine scope.
Trade-off: A normalized cache (per-entity) reflects one update everywhere but adds complexity; a simpler per-request cache plus Room as the source of truth is usually enough on mobile.
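The result-modeling and retry bullets above can be sketched without any networking library. Everything named here is an assumption for illustration: NetworkResult, safeCall, and shouldRetry are hypothetical helpers, and HttpStatusException is a stand-in for Retrofit's HttpException so the block stays dependency-free.

```kotlin
import java.io.IOException

// Sealed result type: upper layers handle these cases instead of raw exceptions.
sealed interface NetworkResult<out T> {
    data class Success<T>(val data: T) : NetworkResult<T>
    data class HttpError(val code: Int, val message: String) : NetworkResult<Nothing>
    data class NetworkFailure(val cause: IOException) : NetworkResult<Nothing>
}

// Hypothetical stand-in for Retrofit's HttpException.
class HttpStatusException(val code: Int, message: String) : RuntimeException(message)

// Converts the two expected failure modes into values. Anything else, including a
// coroutine CancellationException, is deliberately not caught and keeps propagating.
inline fun <T> safeCall(block: () -> T): NetworkResult<T> =
    try {
        NetworkResult.Success(block())
    } catch (e: HttpStatusException) {
        NetworkResult.HttpError(e.code, e.message ?: "HTTP ${e.code}")
    } catch (e: IOException) {
        NetworkResult.NetworkFailure(e)
    }

// Methods defined as idempotent by HTTP semantics; POST and PATCH are excluded.
val idempotentMethods = setOf("GET", "HEAD", "PUT", "DELETE", "OPTIONS")

// Retry only when repeating the request cannot duplicate a side effect,
// and only for failures that are plausibly transient.
fun shouldRetry(method: String, result: NetworkResult<*>): Boolean {
    if (method.uppercase() !in idempotentMethods) return false
    return when (result) {
        is NetworkResult.NetworkFailure -> true
        is NetworkResult.HttpError -> result.code == 429 || result.code in 500..599
        is NetworkResult.Success<*> -> false
    }
}
```

A repository would wrap each API call in safeCall and map the result to UI state, so Compose code only ever sees Success, HttpError, or NetworkFailure.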

04 · Storage, caching & invalidation

Pick storage by shape: Room for structured/relational data and reactive reads; DataStore for key-value/typed preferences; EncryptedSharedPreferences/Keystore for secrets; files/MediaStore for blobs.

  • Cache eviction (free space): LRU or TTL. Cache invalidation (correctness): time-based, event-based (a mutation busts a key), or version-based.
  • Stale-while-revalidate: show cached data instantly, refresh in the background, reconcile — the reason good apps feel fast.
  • Room DAOs returning Flow make the cache reactive: write once, every observer updates.
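To show how eviction and invalidation differ in code, here is a small in-memory cache that combines both. It is a sketch, not a production cache: TtlLruCache is a hypothetical class, it is not thread-safe, and the clock is injected only so that expiry can be tested deterministically.

```kotlin
// LRU eviction bounds the size; TTL invalidation bounds the age of each entry.
class TtlLruCache<K : Any, V : Any>(
    private val maxEntries: Int,
    private val ttlMs: Long,
    private val clock: () -> Long = System::currentTimeMillis,
) {
    private data class Entry<V>(val value: V, val storedAtMs: Long)

    // accessOrder = true keeps iteration order least-recently-used first.
    private val entries = object : LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, Entry<V>>?): Boolean =
            this.size > maxEntries
    }

    fun put(key: K, value: V) {
        entries[key] = Entry(value, clock())
    }

    // Returns null on a miss or when the entry has outlived its TTL.
    fun get(key: K): V? {
        val entry = entries[key] ?: return null
        if (clock() - entry.storedAtMs > ttlMs) {
            entries.remove(key)
            return null
        }
        return entry.value
    }

    // Event-based invalidation: a mutation busts the key it affects.
    fun invalidate(key: K) {
        entries.remove(key)
    }

    val size: Int get() = entries.size
}
```

Eviction here happens on put (space), while expiry and invalidate decide whether a stored value may still be served (correctness).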

05 · Offline-first & sync

The local database is the source of truth; the network only syncs it. The UI always reads local data, so it works with no signal.

  • Optimistic updates: apply the change locally and render immediately; keep the previous value to roll back on failure.
  • Sync queue: enqueue offline mutations and replay them (via WorkManager) when connectivity returns, with idempotency keys.
  • Conflict resolution: choose per data type — last-write-wins, server-wins, or a merge. Name the policy explicitly.
Trade-off: Optimistic UX feels instant but needs rollback and conflict handling; for money/inventory you may prefer pessimistic confirmation. Say which and why.
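The optimistic-update and sync-queue bullets can be combined into one small sketch. The names are assumptions: SyncQueue, PendingMutation, and SendResult are hypothetical, `local` is a plain map standing in for a Room table, and `send` is a lambda standing in for the network call that WorkManager would trigger.

```kotlin
import java.util.UUID

// One queued offline mutation. The idempotency key lets a server ignore a replay
// of a request it has already applied.
data class PendingMutation(
    val idempotencyKey: String = UUID.randomUUID().toString(),
    val itemId: String,
    val newTitle: String,
    val previousTitle: String?, // kept so the optimistic change can be rolled back
)

sealed interface SendResult {
    object Accepted : SendResult
    object Rejected : SendResult     // permanent failure (e.g. validation): roll back
    object Unreachable : SendResult  // transient failure: keep queued and retry later
}

class SyncQueue(
    private val local: MutableMap<String, String>,
    private val send: (PendingMutation) -> SendResult,
) {
    private val pending = ArrayDeque<PendingMutation>()
    val pendingCount: Int get() = pending.size

    // Optimistic update: write locally first so the UI can render at once, then enqueue.
    fun rename(itemId: String, newTitle: String) {
        val previous = local.put(itemId, newTitle)
        pending.addLast(PendingMutation(itemId = itemId, newTitle = newTitle, previousTitle = previous))
    }

    // Replays mutations in order and stops at the first transient failure to preserve ordering.
    fun flush() {
        while (pending.isNotEmpty()) {
            val mutation = pending.first()
            when (send(mutation)) {
                SendResult.Accepted -> pending.removeFirst()
                SendResult.Rejected -> {
                    pending.removeFirst()
                    rollback(mutation)
                }
                SendResult.Unreachable -> return
            }
        }
    }

    private fun rollback(mutation: PendingMutation) {
        // Only undo if nothing newer has overwritten the optimistic value.
        if (local[mutation.itemId] != mutation.newTitle) return
        if (mutation.previousTitle == null) local.remove(mutation.itemId)
        else local[mutation.itemId] = mutation.previousTitle
    }
}
```

A mutation keeps the same idempotency key across every replay, so a request that reached the server but whose response was lost is not applied twice when flush() runs again.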

06 · Real-time — WebSocket, SSE, polling & push

Match the transport to the need and the battery:

  • WebSocket for two-way, low-latency (chat, presence). Manage reconnect/backoff and lifecycle.
  • SSE / long-poll for one-way server→client streams (feeds, live scores).
  • Polling for simple, low-frequency updates without socket infra.
  • FCM push when the app is backgrounded or killed — the only reliable way to wake it.
Trade-off: Holding a socket open for occasional updates drains battery (radio tail energy). Don't keep a live connection when periodic push or polling would do.
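For the WebSocket reconnect/backoff point, a common approach is exponential backoff with a cap and random jitter, so that many clients do not reconnect in lockstep after an outage. The class below is a sketch: ReconnectBackoff is a hypothetical helper, and the default delays are placeholders rather than recommendations.

```kotlin
import kotlin.random.Random

// Exponential backoff with a cap and full jitter: each delay is drawn uniformly
// from 0..min(maxDelayMs, baseDelayMs * 2^attempt).
class ReconnectBackoff(
    private val baseDelayMs: Long = 1_000,
    private val maxDelayMs: Long = 60_000,
    private val random: Random = Random.Default,
) {
    private var attempt = 0

    // Upper bound for a given attempt, before jitter. Exposed for tests and logging.
    fun ceilingMs(attempt: Int): Long {
        val shift = attempt.coerceIn(0, 30) // clamp so the shift stays well within Long range
        return (baseDelayMs shl shift).coerceAtMost(maxDelayMs)
    }

    // Delay to wait before the next reconnect attempt.
    fun nextDelayMs(): Long {
        val ceiling = ceilingMs(attempt)
        attempt++
        return random.nextLong(ceiling + 1)
    }

    // Call after a successful connection so the next outage starts from the base delay.
    fun reset() {
        attempt = 0
    }
}
```

A connection manager would wait for nextDelayMs() between attempts, call reset() once the socket opens, and stop retrying altogether while the screen that needs the socket is not visible.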

07 · Pagination at scale (Paging 3)

For large lists, page lazily. Paging 3 provides a PagingSource (or RemoteMediator for network+DB), exposes Flow<PagingData>, and handles load states and retries.

  • Cursor/keyset pagination is the right default for feeds — stable under inserts, unlike offset/limit.
  • RemoteMediator implements offline-first paging: page from Room, fetch + persist the next page from the network.
  • Collect with collectAsLazyPagingItems() in Compose and render placeholders + append-load spinners.
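The claim that cursor pagination is stable under inserts can be checked with a small in-memory model. FeedBackend, Post, and Page below are hypothetical stand-ins for a real API; the feed is ordered newest-first by id.

```kotlin
data class Post(val id: Long, val text: String)
data class Page(val items: List<Post>, val nextCursor: Long?)

// In-memory stand-in for a backend feed ordered newest-first (highest id first).
class FeedBackend(initial: List<Post>) {
    private val posts = initial.sortedByDescending { it.id }.toMutableList()

    fun publish(post: Post) {
        posts.add(post)
        posts.sortByDescending { it.id }
    }

    // Offset/limit is position-based, so a new post at the top shifts every later page.
    fun pageByOffset(offset: Int, limit: Int): List<Post> = posts.drop(offset).take(limit)

    // Keyset asks for "items older than the cursor", which inserts at the head cannot shift.
    fun pageByCursor(cursor: Long?, limit: Int): Page {
        val items = posts.asSequence()
            .filter { cursor == null || it.id < cursor }
            .take(limit)
            .toList()
        val next = if (items.size == limit) items.last().id else null
        return Page(items, next)
    }
}
```

With six posts and a page size of three, publishing a seventh post between page loads makes the offset-based second page start with post 4 again (a duplicate), while the cursor-based second page still returns posts 3, 2, 1.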

08 · Performance & security essentials

Bake both in from the start:

  • Performance: main-thread discipline, Compose stability, Baseline Profiles, and Macrobenchmark gates in CI. Measure cold start and jank on release builds.
  • Security: Keystore for keys, EncryptedSharedPreferences for secrets, Network Security Config + pinning in transit, Play Integrity for attestation, R8 for obfuscation.
  • Observability: crash reporting (crash-free rate as the headline metric), ANR tracking, and release-health gating on staged rollouts.
Senior tell: Treat a deploy as “done” only when the crash-free rate holds across the rollout; monitoring is part of shipping.
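Release-health gating can be expressed as a small decision function. This is a sketch under stated assumptions: ReleaseHealth, RolloutDecision, and gateRollout are hypothetical names, and the thresholds are placeholders that a team would replace with values taken from its own baseline.

```kotlin
// Snapshot of release health at one rollout stage. Field names are illustrative.
data class ReleaseHealth(val sessions: Long, val crashedSessions: Long)

enum class RolloutDecision { PROCEED, HOLD, HALT }

fun crashFreeRate(health: ReleaseHealth): Double =
    if (health.sessions == 0L) 1.0
    else 1.0 - health.crashedSessions.toDouble() / health.sessions

// Threshold defaults are assumptions for this sketch, not recommended values.
fun gateRollout(
    health: ReleaseHealth,
    minSessions: Long = 1_000,
    targetCrashFree: Double = 0.995,
    haltBelow: Double = 0.99,
): RolloutDecision {
    // Too little data: a handful of crashes would swing the rate, so wait.
    if (health.sessions < minSessions) return RolloutDecision.HOLD
    val rate = crashFreeRate(health)
    return when {
        rate < haltBelow -> RolloutDecision.HALT
        rate < targetCrashFree -> RolloutDecision.HOLD
        else -> RolloutDecision.PROCEED
    }
}
```

Requiring a minimum session count before deciding keeps an early, noisy sample from either halting a healthy release or promoting a broken one.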

Deep dives

The senior playbook in a concept → example → problem → solution shape, so each idea sticks as a real engineering decision rather than a definition.

State

Lifecycle-safe Flow collection

Concept: collecting a Flow keeps the upstream working as long as the collector is active.

Problem: collecting in onCreate with a bare lifecycleScope.launch keeps collecting in the background — wasted work and stale UI updates.

Solution: bound collection to the lifecycle.

    // Compose
    val ui by viewModel.uiState.collectAsStateWithLifecycle()

    // Views
    lifecycleScope.launch {
        repeatOnLifecycle(Lifecycle.State.STARTED) {
            viewModel.uiState.collect { render(it) }
        }
    }

Collection now stops below STARTED and restarts on return — no background churn.

State

StateFlow caching with stateIn(WhileSubscribed)

Concept: a cold Flow restarts per collector; you want one shared, cached state stream.

Problem: on rotation the subscriber briefly drops, and a naive share tears down and re-fetches.

Solution: share with a stop-timeout so a config change doesn't restart the upstream.

    val uiState: StateFlow<UiState> = repository.observeItems()
        .map { UiState(items = it) }
        .stateIn(
            scope = viewModelScope,
            started = SharingStarted.WhileSubscribed(5_000),
            initialValue = UiState(loading = true),
        )

The 5s window keeps the cached value across rotation while still stopping work when the screen is truly gone.

Flow

Search with flatMapLatest + debounce

Concept: a search box should cancel the previous query when the user keeps typing.

Problem: firing a request per keystroke wastes the network and can render stale results out of order.

Solution: debounce the query and switch to the latest with flatMapLatest.

    val results = queryFlow
        .debounce(300)
        .distinctUntilChanged()
        .flatMapLatest { q ->
            if (q.isBlank()) flowOf(emptyList())
            else repository.search(q) // cancelled when q changes
        }
        .flowOn(Dispatchers.Default)

flatMapLatest cancels the in-flight inner flow the moment a new query arrives — no race, no stale list.

Compose

Fixing over-recomposition with stability

Concept: Compose skips a composable only when its parameters are stable and unchanged.

Problem: a composable takes items: List<Item>; the compiler treats List as unstable, so the composable can't be skipped and recomposes whenever its parent does, even if the list hasn't changed.

Solution: use an immutable collection (or annotate the model) so Compose can skip.

    // build.gradle: implementation("org.jetbrains.kotlinx:kotlinx-collections-immutable:...")

    @Immutable
    data class Item(val id: String, val title: String)

    @Composable
    fun Feed(items: ImmutableList<Item>) {
        LazyColumn {
            items(items, key = { it.id }) { ItemRow(it) }
        }
    }

Verify with Layout Inspector recomposition counts or the Compose compiler metrics report.

Concurrency

Main-safety with withContext

Concept: a suspend function should be safe to call from the main thread.

Problem: doing blocking I/O (disk, JSON parse) on Dispatchers.Main drops frames or triggers an ANR.

Solution: switch dispatchers inside the function, and inject the dispatcher for testability.

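    class Repo(
        private val api: Api, // assumed: a Retrofit service with a suspend fetch(id)
        private val io: CoroutineDispatcher = Dispatchers.IO,
    ) {
        suspend fun load(id: String): Model = withContext(io) {
            val dto = api.fetch(id) // blocking-safe on IO
            dto.toModel()
        }
    }

    // test, inside runTest { }: Repo(fakeApi, StandardTestDispatcher(testScheduler))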

Callers stay on Main; the function owns its threading. Injecting io lets tests control it.

Jetpack

Guaranteed background work with WorkManager

Concept: some work must complete even across process death and reboot.

Problem: a coroutine in viewModelScope dies with the screen; a raw Service is heavy and fragile for a deferrable sync.

Solution: enqueue a constrained CoroutineWorker.

    class SyncWorker(c: Context, p: WorkerParameters) : CoroutineWorker(c, p) {
        // `repo` is assumed to be injected (for example with @HiltWorker); it is not defined here.
        override suspend fun doWork(): Result = try {
            repo.sync()
            Result.success()
        } catch (e: IOException) {
            Result.retry() // WorkManager reschedules with its backoff policy
        }
    }

    val req = OneTimeWorkRequestBuilder<SyncWorker>()
        .setConstraints(Constraints(requiredNetworkType = NetworkType.CONNECTED))
        .build()

    WorkManager.getInstance(ctx)
        .enqueueUniqueWork("sync", ExistingWorkPolicy.KEEP, req)

DI

Hilt scopes & @Binds vs @Provides

Concept: Hilt generates a dependency graph tied to Android lifecycles.

Problem: you need an interface→impl binding and a third-party object you don't own, with the right lifetimes.

Solution: @Binds for interfaces, @Provides for owned-elsewhere types, scoped correctly.

    @Module
    @InstallIn(SingletonComponent::class)
    abstract class DataModule {

        @Binds @Singleton
        abstract fun bindRepo(impl: RepoImpl): Repo

        companion object {
            @Provides @Singleton
            fun retrofit(): Retrofit = Retrofit.Builder()
                .baseUrl(BASE)
                .addConverterFactory(/* ... */)
                .build()
        }
    }

    @HiltViewModel
    class FeedViewModel @Inject constructor(
        private val repo: Repo,
    ) : ViewModel()

A missing binding is a build error, not a runtime crash — a real advantage to name.

Data

Room as the single source of truth

Concept: the UI observes the database; the network only updates it.

Problem: fetching directly into the UI gives spinners everywhere and divergent copies of the data.

Solution: read from a Flow DAO; refresh writes back to Room, which re-emits.

    @Dao
    interface ItemDao {
        @Query("SELECT * FROM items ORDER BY updatedAt DESC")
        fun observeAll(): Flow<List<ItemEntity>>

        @Upsert
        suspend fun upsertAll(items: List<ItemEntity>)
    }

    // Repository
    fun observeItems(): Flow<List<Item>> =
        dao.observeAll().map { it.toDomain() }

    suspend fun refresh() {
        dao.upsertAll(api.fetch().toEntities())
    }

The screen renders instantly from cache and updates the moment refresh() writes — stale-while-revalidate, built in.

Security

Certificate pinning (and its rotation risk)

Concept: pinning trusts only your server's certificate/public key, blocking man-in-the-middle via a rogue CA.

Problem: if you pin a single cert and rotate it, every old client breaks.

Solution: pin with OkHttp, include a backup pin, and ship pins ahead of rotation.

    val pinner = CertificatePinner.Builder()
        .add("api.example.com", "sha256/AAAA…") // current
        .add("api.example.com", "sha256/BBBB…") // backup / next
        .build()

    val client = OkHttpClient.Builder()
        .certificatePinner(pinner)
        .build()

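OkHttp pins are SHA-256 hashes of a certificate's SPKI (public key) rather than of the whole certificate, so a pin survives certificate renewal as long as the key pair is reused. Always keep a backup pin to survive key rotation.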
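To make the pin format less opaque, the sketch below derives a "sha256/…" string from a public key using only JDK classes. spkiPin and demoPin are hypothetical helper names, and the key pair is generated on the spot purely so the example runs without a real certificate; in practice you would hash the key from your server's certificate. On Android, java.util.Base64 requires API 26 or later.

```kotlin
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.util.Base64

// Builds a pin in the "sha256/<base64>" form: the SHA-256 digest of the key's
// DER-encoded SubjectPublicKeyInfo, base64-encoded.
fun spkiPin(publicKey: PublicKey): String {
    val spki = publicKey.encoded // X.509 SubjectPublicKeyInfo bytes for standard JCA keys
    val digest = MessageDigest.getInstance("SHA-256").digest(spki)
    return "sha256/" + Base64.getEncoder().encodeToString(digest)
}

// Generates a throwaway RSA key pair so the sketch is runnable on its own.
fun demoPin(): String {
    val keyPair = KeyPairGenerator.getInstance("RSA")
        .apply { initialize(2048) }
        .generateKeyPair()
    return spkiPin(keyPair.public)
}
```

Because the hash covers only the public key, reissuing a certificate for the same key pair yields the same pin, while rotating the key yields a new one, which is why the backup pin has to ship before the rotation.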

Performance

Baseline Profiles for cold start

Concept: a Baseline Profile lists hot code so ART AOT-compiles it at install instead of interpreting on first run.

Problem: the first launch and first scroll are slow because that code starts out interpreted and is only JIT-compiled once it runs hot.

Solution: generate a profile with a Macrobenchmark journey and ship it.

    @get:Rule
    val baselineRule = BaselineProfileRule()

    @Test
    fun generate() = baselineRule.collect(
        packageName = "com.example.app",
    ) {
        startActivityAndWait()
        // scroll the feed so the hot path is captured
        device.findObject(By.res("feed")).fling(Direction.DOWN)
    }

Measure the before/after with a separate Macrobenchmark and keep it in CI so the win can't regress.